{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "provenance": [],
      "machine_shape": "hm"
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "language_info": {
      "name": "python"
    },
    "gpuClass": "standard",
    "accelerator": "TPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# 转换并量化中文LLaMA/Alpaca模型\n",
        "\n",
        "🎉🎉🎉 **新：现在免费用户也有机会能够转换7B和13B模型了！**\n",
        "\n",
        "💡 提示和小窍门：\n",
        "- 免费用户默认的内存只有12G左右，**笔者用免费账号实测选择TPU的话有机会随机出35G内存**，建议多试几次。如果能随机出25G内存以上的机器就可以了转换7B模型了，35G内存以上机器就能转换13B模型了\n",
        "- Pro(+)用户请选择 “代码执行程序” -> “更改运行时类型” -> “高RAM”\n",
        "- 实测：转换7B级别模型，25G内存的机器就够了；转换13B级别模型需要30G以上的内存（程序莫名崩掉或断开连接就说明内存爆了）\n",
        "- 如果选了“高RAM”之后内存还是不够大的话，选择以下操作，有的时候会分配出很高内存的机器，祝你好运😄！\n",
        "    - 可以把GPU或者TPU也选上（虽然不会用到）\n",
        "    - 选GPU时，Pro用户可选“高级”类型GPU\n",
        "\n",
        "以下信息配置信息供参考（Pro订阅下测试），运行时规格设置为“高RAM”时的设备配置如下（有随机性）：\n",
        "\n",
        "| 硬件加速器  |  RAM  |  硬盘  |\n",
        "| :-- | :--: | :--: |\n",
        "| None | 25GB | 225GB |\n",
        "| TPU | 35GB | 225GB |\n",
        "| GPU（标准，T4）| 25GB | 166GB |\n",
        "| GPU（高性能，V100）| 25GB | 166GB |\n",
        "| GPU（高性能，A100）| **80GB** | 166GB |\n",
        "\n",
        "*温馨提示：用完之后注意断开运行时，选择满足要求的最低配置即可，避免不必要的计算单元消耗（Pro只给100个计算单元）。*"
      ],
      "metadata": {
        "id": "B1c96_k3MahN"
      }
    },
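    {
      "cell_type": "markdown",
      "source": [
        "Before proceeding, you can verify how much RAM the allocated machine actually has. Below is a minimal check, assuming the standard Colab image (where `psutil` ships preinstalled):"
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# Report total/available RAM to confirm the 25GB (7B) or 35GB (13B) requirement is met\n",
        "import psutil\n",
        "\n",
        "mem = psutil.virtual_memory()\n",
        "print(f\"Total RAM:     {mem.total / 1e9:.1f} GB\")\n",
        "print(f\"Available RAM: {mem.available / 1e9:.1f} GB\")"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },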
    {
      "cell_type": "markdown",
      "source": [
        "## 安装相关依赖"
      ],
      "metadata": {
        "id": "vScqHD_jMFOV"
      }
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "E5WKFJXIL6ZU",
        "outputId": "442638b1-68de-4edb-f303-0a1bed7dbedd"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting git+https://github.com/huggingface/transformers.git\n",
            "  Cloning https://github.com/huggingface/transformers.git to /tmp/pip-req-build-qrvkcgfv\n",
            "  Running command git clone --filter=blob:none --quiet https://github.com/huggingface/transformers.git /tmp/pip-req-build-qrvkcgfv\n",
            "  Resolved https://github.com/huggingface/transformers.git to commit 4c01231e67f0d699e0236c11178c956fb9753a17\n",
            "  Installing build dependencies ... \u001b[?25l\u001b[?25hdone\n",
            "  Getting requirements to build wheel ... \u001b[?25l\u001b[?25hdone\n",
            "  Preparing metadata (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from transformers==4.28.0.dev0) (1.24.2)\n",
            "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers==4.28.0.dev0) (4.65.0)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers==4.28.0.dev0) (2.27.1)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers==4.28.0.dev0) (2022.10.31)\n",
            "Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from transformers==4.28.0.dev0) (23.0)\n",
            "Collecting huggingface-hub<1.0,>=0.11.0\n",
            "  Downloading huggingface_hub-0.13.4-py3-none-any.whl (200 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m200.1/200.1 kB\u001b[0m \u001b[31m6.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from transformers==4.28.0.dev0) (3.10.7)\n",
            "Collecting tokenizers!=0.11.3,<0.14,>=0.11.1\n",
            "  Downloading tokenizers-0.13.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m7.8/7.8 MB\u001b[0m \u001b[31m74.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.9/dist-packages (from transformers==4.28.0.dev0) (6.0)\n",
            "Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.9/dist-packages (from huggingface-hub<1.0,>=0.11.0->transformers==4.28.0.dev0) (4.5.0)\n",
            "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers==4.28.0.dev0) (1.26.15)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers==4.28.0.dev0) (3.4)\n",
            "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers==4.28.0.dev0) (2.0.12)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers==4.28.0.dev0) (2022.12.7)\n",
            "Building wheels for collected packages: transformers\n",
            "  Building wheel for transformers (pyproject.toml) ... \u001b[?25l\u001b[?25hdone\n",
            "  Created wheel for transformers: filename=transformers-4.28.0.dev0-py3-none-any.whl size=6911490 sha256=c1484b91f5ef8feccdfa356e11e61d5f16be66f44c10ee8370398bb3a969f907\n",
            "  Stored in directory: /tmp/pip-ephem-wheel-cache-sa1w67v2/wheels/f7/92/8c/752ff3bfcd3439805d8bbf641614da38ef3226e127ebea86ee\n",
            "Successfully built transformers\n",
            "Installing collected packages: tokenizers, huggingface-hub, transformers\n",
            "Successfully installed huggingface-hub-0.13.4 tokenizers-0.13.3 transformers-4.28.0.dev0\n",
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting peft\n",
            "  Downloading peft-0.2.0-py3-none-any.whl (40 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m40.3/40.3 kB\u001b[0m \u001b[31m3.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.9/dist-packages (from peft) (23.0)\n",
            "Requirement already satisfied: transformers in /usr/local/lib/python3.9/dist-packages (from peft) (4.28.0.dev0)\n",
            "Requirement already satisfied: pyyaml in /usr/local/lib/python3.9/dist-packages (from peft) (6.0)\n",
            "Requirement already satisfied: psutil in /usr/local/lib/python3.9/dist-packages (from peft) (5.9.4)\n",
            "Requirement already satisfied: torch>=1.13.0 in /usr/local/lib/python3.9/dist-packages (from peft) (2.0.0+cu118)\n",
            "Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.9/dist-packages (from peft) (1.24.2)\n",
            "Collecting accelerate\n",
            "  Downloading accelerate-0.18.0-py3-none-any.whl (215 kB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m215.3/215.3 kB\u001b[0m \u001b[31m6.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hRequirement already satisfied: jinja2 in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.1.2)\n",
            "Requirement already satisfied: networkx in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.0)\n",
            "Requirement already satisfied: triton==2.0.0 in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (2.0.0)\n",
            "Requirement already satisfied: filelock in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (3.10.7)\n",
            "Requirement already satisfied: sympy in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (1.11.1)\n",
            "Requirement already satisfied: typing-extensions in /usr/local/lib/python3.9/dist-packages (from torch>=1.13.0->peft) (4.5.0)\n",
            "Requirement already satisfied: lit in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.13.0->peft) (16.0.0)\n",
            "Requirement already satisfied: cmake in /usr/local/lib/python3.9/dist-packages (from triton==2.0.0->torch>=1.13.0->peft) (3.25.2)\n",
            "Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (2022.10.31)\n",
            "Requirement already satisfied: requests in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (2.27.1)\n",
            "Requirement already satisfied: huggingface-hub<1.0,>=0.11.0 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (0.13.4)\n",
            "Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (4.65.0)\n",
            "Requirement already satisfied: tokenizers!=0.11.3,<0.14,>=0.11.1 in /usr/local/lib/python3.9/dist-packages (from transformers->peft) (0.13.3)\n",
            "Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.9/dist-packages (from jinja2->torch>=1.13.0->peft) (2.1.2)\n",
            "Requirement already satisfied: charset-normalizer~=2.0.0 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (2.0.12)\n",
            "Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (3.4)\n",
            "Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (2022.12.7)\n",
            "Requirement already satisfied: urllib3<1.27,>=1.21.1 in /usr/local/lib/python3.9/dist-packages (from requests->transformers->peft) (1.26.15)\n",
            "Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.9/dist-packages (from sympy->torch>=1.13.0->peft) (1.3.0)\n",
            "Installing collected packages: accelerate, peft\n",
            "Successfully installed accelerate-0.18.0 peft-0.2.0\n",
            "Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/\n",
            "Collecting sentencepiece\n",
            "  Downloading sentencepiece-0.1.97-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)\n",
            "\u001b[2K     \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m1.3/1.3 MB\u001b[0m \u001b[31m19.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25hInstalling collected packages: sentencepiece\n",
            "Successfully installed sentencepiece-0.1.97\n"
          ]
        }
      ],
      "source": [
        "!pip install git+https://github.com/huggingface/transformers.git\n",
        "!pip install peft\n",
        "!pip install sentencepiece"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 克隆目录和代码"
      ],
      "metadata": {
        "id": "ygb1xFIMNQKw"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca\n",
        "!git clone https://github.com/ggerganov/llama.cpp"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "yCEJh7NJNXz9",
        "outputId": "5274c3c5-a0b5-4158-b5c7-647fdde9e736"
      },
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Cloning into 'Chinese-LLaMA-Alpaca'...\n",
            "remote: Enumerating objects: 479, done.\u001b[K\n",
            "remote: Counting objects: 100% (210/210), done.\u001b[K\n",
            "remote: Compressing objects: 100% (131/131), done.\u001b[K\n",
            "remote: Total 479 (delta 94), reused 138 (delta 79), pack-reused 269\u001b[K\n",
            "Receiving objects: 100% (479/479), 10.68 MiB | 25.91 MiB/s, done.\n",
            "Resolving deltas: 100% (279/279), done.\n",
            "Cloning into 'llama.cpp'...\n",
            "remote: Enumerating objects: 1503, done.\u001b[K\n",
            "remote: Counting objects: 100% (564/564), done.\u001b[K\n",
            "remote: Compressing objects: 100% (63/63), done.\u001b[K\n",
            "remote: Total 1503 (delta 520), reused 508 (delta 501), pack-reused 939\u001b[K\n",
            "Receiving objects: 100% (1503/1503), 1.79 MiB | 14.58 MiB/s, done.\n",
            "Resolving deltas: 100% (945/945), done.\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 合并模型（以Alpaca-7B为例）\n",
        "\n",
        "**⚠️ 再次提醒：7B模型需要25G内存，13B模型需要35G+内存。**\n",
        "\n",
        "此处使用的是🤗模型库中提供的基模型（已是HF格式），而不是Facebook官方的LLaMA模型，因此略去将原版LLaMA转换为HF格式的步骤。\n",
        "\n",
        "**这里直接运行第二步：合并LoRA权重**，生成全量模型权重。可以直接指定🤗模型库的地址，也可以是本地存放地址。\n",
        "- 基模型：`decapoda-research/llama-7b-hf` *（use at your own risk）*\n",
        "- LoRA模型：`ziqingyang/chinese-alpaca-lora-7b`\n",
        "\n",
        "💡 转换13B模型提示：\n",
        "- 请将参数`--base_model`和`--lora_model`中的的`7b`改为`13b`即可\n",
        "- **免费用户必须增加一个参数`--offload_dir`以缓解内存压力**，例如`--offload_dir ./offload_temp`\n",
        "\n",
        "该过程比较耗时（下载+转换），需要几分钟到十几分钟不等，请耐心等待。\n",
        "转换好的模型存放在`alpaca-combined`目录。\n",
        "如果你不需要量化模型，那么到这一步就结束了。"
      ],
      "metadata": {
        "id": "nIyxX0DSNsgQ"
      }
    },
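    {
      "cell_type": "markdown",
      "source": [
        "For reference, here is a sketch of the full 13B merge command, applying the `7b` → `13b` substitution and the `--offload_dir` tip from above (the offload path is just an example; adjust it as needed):\n",
        "\n",
        "```\n",
        "!python ./Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora.py \\\n",
        "    --base_model 'decapoda-research/llama-13b-hf' \\\n",
        "    --lora_model 'ziqingyang/chinese-alpaca-lora-13b' \\\n",
        "    --output_dir alpaca-combined \\\n",
        "    --offload_dir ./offload_temp\n",
        "```"
      ],
      "metadata": {}
    },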
    {
      "cell_type": "code",
      "source": [
        "!python ./Chinese-LLaMA-Alpaca/scripts/merge_llama_with_chinese_lora.py \\\n",
        "    --base_model 'decapoda-research/llama-7b-hf' \\\n",
        "    --lora_model 'ziqingyang/chinese-alpaca-lora-7b' \\\n",
        "    --output_dir alpaca-combined"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "5AV4EW5hNhVV",
        "outputId": "b0077c5c-d533-4948-9305-ebf1290143a9"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "2023-04-11 10:06:51.752004: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
            "Downloading tokenizer.model: 100% 758k/758k [00:00<00:00, 14.8MB/s]\n",
            "Downloading (…)cial_tokens_map.json: 100% 96.0/96.0 [00:00<00:00, 15.3kB/s]\n",
            "Downloading (…)okenizer_config.json: 100% 166/166 [00:00<00:00, 63.7kB/s]\n",
            "Downloading (…)lve/main/config.json: 100% 427/427 [00:00<00:00, 62.9kB/s]\n",
            "Downloading (…)model.bin.index.json: 100% 25.5k/25.5k [00:00<00:00, 9.22MB/s]\n",
            "Downloading shards:   0% 0/33 [00:00<?, ?it/s]\n",
            "Downloading (…)l-00001-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:   3% 10.5M/405M [00:00<00:03, 102MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 164MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  31% 126M/405M [00:00<00:01, 212MB/s] \u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  39% 157M/405M [00:00<00:01, 215MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  47% 189M/405M [00:00<00:00, 216MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  54% 220M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  62% 252M/405M [00:01<00:00, 214MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  70% 283M/405M [00:01<00:00, 216MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  78% 315M/405M [00:01<00:00, 217MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  85% 346M/405M [00:01<00:00, 218MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin:  93% 377M/405M [00:01<00:00, 218MB/s]\u001b[A\n",
            "Downloading (…)l-00001-of-00033.bin: 100% 405M/405M [00:01<00:00, 209MB/s]\n",
            "Downloading shards:   3% 1/33 [00:02<01:05,  2.06s/it]\n",
            "Downloading (…)l-00002-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:   3% 10.5M/405M [00:00<00:03, 99.2MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 159MB/s] \u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  31% 126M/405M [00:00<00:01, 205MB/s] \u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  36% 147M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  41% 168M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  49% 199M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  57% 231M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  62% 252M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  70% 283M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  78% 315M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  85% 346M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin:  93% 377M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00002-of-00033.bin: 100% 405M/405M [00:01<00:00, 204MB/s]\n",
            "Downloading shards:   6% 2/33 [00:04<01:06,  2.15s/it]\n",
            "Downloading (…)l-00003-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 94.1MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 154MB/s] \u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  28% 115M/405M [00:00<00:01, 201MB/s] \u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  34% 136M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  41% 168M/405M [00:00<00:01, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  49% 199M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  54% 220M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  62% 252M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  70% 283M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  78% 315M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  85% 346M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin:  93% 377M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00003-of-00033.bin: 100% 405M/405M [00:01<00:00, 202MB/s]\n",
            "Downloading shards:   9% 3/33 [00:06<01:04,  2.14s/it]\n",
            "Downloading (…)l-00004-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 93.1MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 142MB/s] \u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  28% 115M/405M [00:00<00:01, 197MB/s] \u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  36% 147M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  44% 178M/405M [00:00<00:01, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  52% 210M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  60% 241M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  67% 273M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  75% 304M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  83% 336M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin:  91% 367M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00004-of-00033.bin: 100% 405M/405M [00:02<00:00, 201MB/s]\n",
            "Downloading shards:  12% 4/33 [00:08<01:02,  2.14s/it]\n",
            "Downloading (…)l-00005-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 97.1MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 156MB/s] \u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 186MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  26% 105M/405M [00:00<00:01, 198MB/s] \u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  34% 136M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  41% 168M/405M [00:00<00:01, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  49% 199M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  57% 231M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  62% 252M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  70% 283M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  78% 315M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  83% 336M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  88% 357M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin:  93% 377M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00005-of-00033.bin: 100% 405M/405M [00:02<00:00, 201MB/s]\n",
            "Downloading shards:  15% 5/33 [00:10<01:00,  2.14s/it]\n",
            "Downloading (…)l-00006-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 97.1MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 153MB/s] \u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  13% 52.4M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  21% 83.9M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  26% 105M/405M [00:00<00:01, 198MB/s] \u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  34% 136M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  39% 157M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  44% 178M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  49% 199M/405M [00:01<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  57% 231M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  62% 252M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  67% 273M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  75% 304M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  83% 336M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin:  91% 367M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00006-of-00033.bin: 100% 405M/405M [00:02<00:00, 201MB/s]\n",
            "Downloading shards:  18% 6/33 [00:12<00:58,  2.15s/it]\n",
            "Downloading (…)l-00007-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 97.4MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 157MB/s] \u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  31% 126M/405M [00:00<00:01, 203MB/s] \u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  39% 157M/405M [00:00<00:01, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  47% 189M/405M [00:00<00:01, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  52% 210M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  57% 231M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  62% 252M/405M [00:01<00:00, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  67% 273M/405M [00:01<00:00, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  75% 304M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  83% 336M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin:  88% 357M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00007-of-00033.bin: 100% 405M/405M [00:02<00:00, 201MB/s]\n",
            "Downloading shards:  21% 7/33 [00:14<00:55,  2.15s/it]\n",
            "Downloading (…)l-00008-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 71.4MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:   5% 21.0M/405M [00:00<00:04, 86.1MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:   8% 31.5M/405M [00:00<00:04, 85.5MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  10% 41.9M/405M [00:00<00:04, 88.6MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  13% 52.4M/405M [00:00<00:03, 91.9MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  16% 62.9M/405M [00:00<00:03, 94.6MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  18% 73.4M/405M [00:00<00:03, 95.8MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  21% 83.9M/405M [00:00<00:03, 97.9MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  23% 94.4M/405M [00:01<00:03, 96.7MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  26% 105M/405M [00:01<00:03, 98.1MB/s] \u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  28% 115M/405M [00:01<00:02, 97.1MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  31% 126M/405M [00:01<00:03, 91.3MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  34% 136M/405M [00:01<00:03, 67.1MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  36% 147M/405M [00:01<00:03, 64.9MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  39% 157M/405M [00:01<00:03, 68.5MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  41% 168M/405M [00:02<00:03, 75.1MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  44% 178M/405M [00:02<00:02, 78.5MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  47% 189M/405M [00:02<00:02, 83.6MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  49% 199M/405M [00:02<00:02, 87.9MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  52% 210M/405M [00:02<00:02, 90.8MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  54% 220M/405M [00:02<00:01, 93.7MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  57% 231M/405M [00:02<00:01, 94.7MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  60% 241M/405M [00:02<00:01, 95.5MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  62% 252M/405M [00:02<00:01, 95.1MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  65% 262M/405M [00:03<00:01, 95.2MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  67% 273M/405M [00:03<00:01, 95.9MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  70% 283M/405M [00:03<00:01, 96.7MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  73% 294M/405M [00:03<00:01, 97.6MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  75% 304M/405M [00:03<00:01, 97.3MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  78% 315M/405M [00:03<00:00, 98.2MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  80% 325M/405M [00:03<00:00, 98.9MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  83% 336M/405M [00:03<00:00, 97.8MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  85% 346M/405M [00:03<00:00, 97.2MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  88% 357M/405M [00:03<00:00, 98.0MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  91% 367M/405M [00:04<00:00, 98.0MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  93% 377M/405M [00:04<00:00, 98.3MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin:  96% 388M/405M [00:04<00:00, 98.7MB/s]\u001b[A\n",
            "Downloading (…)l-00008-of-00033.bin: 100% 405M/405M [00:04<00:00, 90.5MB/s]\n",
            "Downloading shards:  24% 8/33 [00:19<01:13,  2.93s/it]\n",
            "Downloading (…)l-00009-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 73.4MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:   5% 21.0M/405M [00:00<00:04, 80.6MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  10% 41.9M/405M [00:00<00:03, 92.5MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  13% 52.4M/405M [00:00<00:03, 94.9MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  16% 62.9M/405M [00:00<00:03, 87.4MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  18% 73.4M/405M [00:00<00:03, 90.9MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  21% 83.9M/405M [00:00<00:03, 93.2MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  23% 94.4M/405M [00:01<00:03, 92.6MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  26% 105M/405M [00:01<00:03, 86.0MB/s] \u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  28% 115M/405M [00:01<00:03, 86.4MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  31% 126M/405M [00:01<00:03, 89.6MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  34% 136M/405M [00:01<00:02, 90.0MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  36% 147M/405M [00:01<00:02, 89.5MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  39% 157M/405M [00:01<00:02, 91.2MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  41% 168M/405M [00:01<00:02, 93.4MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  44% 178M/405M [00:01<00:02, 87.6MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  47% 189M/405M [00:02<00:02, 88.7MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  49% 199M/405M [00:02<00:02, 88.0MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  52% 210M/405M [00:02<00:02, 86.6MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  54% 220M/405M [00:02<00:02, 88.0MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  57% 231M/405M [00:02<00:02, 85.0MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  60% 241M/405M [00:02<00:01, 86.2MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  62% 252M/405M [00:02<00:01, 89.8MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  65% 262M/405M [00:02<00:01, 91.5MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  67% 273M/405M [00:03<00:01, 93.1MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  70% 283M/405M [00:03<00:01, 94.3MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  73% 294M/405M [00:03<00:01, 90.9MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  75% 304M/405M [00:03<00:01, 92.0MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  78% 315M/405M [00:03<00:00, 91.3MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  80% 325M/405M [00:03<00:00, 84.0MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  83% 336M/405M [00:03<00:00, 84.5MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  85% 346M/405M [00:03<00:00, 81.7MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  88% 357M/405M [00:04<00:00, 87.0MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  91% 367M/405M [00:04<00:00, 90.7MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  93% 377M/405M [00:04<00:00, 93.5MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin:  96% 388M/405M [00:04<00:00, 84.5MB/s]\u001b[A\n",
            "Downloading (…)l-00009-of-00033.bin: 100% 405M/405M [00:04<00:00, 88.8MB/s]\n",
            "Downloading shards:  27% 9/33 [00:24<01:23,  3.48s/it]\n",
            "Downloading (…)l-00010-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 95.9MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 155MB/s] \u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  13% 52.4M/405M [00:00<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  36% 147M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  44% 178M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  52% 210M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  60% 241M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  67% 273M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  75% 304M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  83% 336M/405M [00:01<00:00, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  88% 357M/405M [00:01<00:00, 152MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin:  93% 377M/405M [00:02<00:00, 138MB/s]\u001b[A\n",
            "Downloading (…)l-00010-of-00033.bin: 100% 405M/405M [00:02<00:00, 166MB/s]\n",
            "Downloading shards:  30% 10/33 [00:26<01:13,  3.20s/it]\n",
            "Downloading (…)l-00011-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 70.8MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:   5% 21.0M/405M [00:00<00:04, 87.3MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  10% 41.9M/405M [00:00<00:03, 101MB/s] \u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  16% 62.9M/405M [00:00<00:03, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  21% 83.9M/405M [00:00<00:03, 106MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  26% 105M/405M [00:01<00:02, 102MB/s] \u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  31% 126M/405M [00:01<00:02, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  34% 136M/405M [00:01<00:02, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  39% 157M/405M [00:01<00:02, 103MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  41% 168M/405M [00:01<00:02, 100MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  47% 189M/405M [00:01<00:02, 103MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  52% 210M/405M [00:02<00:01, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  54% 220M/405M [00:02<00:01, 103MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  60% 241M/405M [00:02<00:01, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  65% 262M/405M [00:02<00:01, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  70% 283M/405M [00:02<00:01, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  73% 294M/405M [00:02<00:01, 103MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  75% 304M/405M [00:02<00:00, 101MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  80% 325M/405M [00:03<00:00, 103MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  83% 336M/405M [00:03<00:00, 103MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  88% 357M/405M [00:03<00:00, 103MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  91% 367M/405M [00:03<00:00, 102MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin:  96% 388M/405M [00:03<00:00, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00011-of-00033.bin: 100% 405M/405M [00:03<00:00, 103MB/s]\n",
            "Downloading shards:  33% 11/33 [00:30<01:16,  3.47s/it]\n",
            "Downloading (…)l-00012-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 70.1MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:   5% 21.0M/405M [00:00<00:04, 81.1MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:   8% 31.5M/405M [00:00<00:04, 83.1MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  10% 41.9M/405M [00:00<00:04, 88.4MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  13% 52.4M/405M [00:00<00:03, 90.8MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  16% 62.9M/405M [00:00<00:03, 93.5MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  18% 73.4M/405M [00:00<00:03, 94.5MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  21% 83.9M/405M [00:00<00:03, 93.6MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  23% 94.4M/405M [00:01<00:03, 94.1MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  26% 105M/405M [00:01<00:03, 93.9MB/s] \u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  28% 115M/405M [00:01<00:03, 93.9MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  31% 126M/405M [00:01<00:03, 91.6MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  34% 136M/405M [00:01<00:02, 93.5MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  36% 147M/405M [00:01<00:02, 94.6MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  39% 157M/405M [00:01<00:02, 93.9MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  41% 168M/405M [00:01<00:02, 93.3MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  44% 178M/405M [00:01<00:02, 94.4MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  47% 189M/405M [00:02<00:02, 93.2MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  49% 199M/405M [00:02<00:02, 94.0MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  52% 210M/405M [00:02<00:02, 92.6MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  54% 220M/405M [00:02<00:01, 92.6MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  57% 231M/405M [00:02<00:01, 91.0MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  60% 241M/405M [00:02<00:01, 90.9MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  62% 252M/405M [00:02<00:01, 92.5MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  65% 262M/405M [00:02<00:01, 93.2MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  67% 273M/405M [00:02<00:01, 93.3MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  70% 283M/405M [00:03<00:01, 94.4MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  73% 294M/405M [00:03<00:01, 93.8MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  75% 304M/405M [00:03<00:01, 93.0MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  78% 315M/405M [00:03<00:00, 93.0MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  80% 325M/405M [00:03<00:00, 89.9MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  83% 336M/405M [00:03<00:00, 91.4MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  85% 346M/405M [00:03<00:00, 92.4MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  88% 357M/405M [00:03<00:00, 93.7MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  91% 367M/405M [00:03<00:00, 93.8MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  93% 377M/405M [00:04<00:00, 94.3MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin:  96% 388M/405M [00:04<00:00, 95.4MB/s]\u001b[A\n",
            "Downloading (…)l-00012-of-00033.bin: 100% 405M/405M [00:04<00:00, 92.2MB/s]\n",
            "Downloading shards:  36% 12/33 [00:35<01:19,  3.81s/it]\n",
            "Downloading (…)l-00013-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 79.6MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:   5% 21.0M/405M [00:00<00:04, 91.8MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  10% 41.9M/405M [00:00<00:03, 102MB/s] \u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  16% 62.9M/405M [00:00<00:03, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  18% 73.4M/405M [00:00<00:03, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  21% 83.9M/405M [00:00<00:03, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  26% 105M/405M [00:01<00:02, 105MB/s] \u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  31% 126M/405M [00:01<00:02, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  36% 147M/405M [00:01<00:02, 107MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  41% 168M/405M [00:01<00:02, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  47% 189M/405M [00:01<00:02, 106MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  52% 210M/405M [00:02<00:01, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  57% 231M/405M [00:02<00:01, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  62% 252M/405M [00:02<00:01, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  67% 273M/405M [00:02<00:01, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  73% 294M/405M [00:02<00:01, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  78% 315M/405M [00:03<00:00, 104MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  83% 336M/405M [00:03<00:00, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  88% 357M/405M [00:03<00:00, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin:  93% 377M/405M [00:03<00:00, 105MB/s]\u001b[A\n",
            "Downloading (…)l-00013-of-00033.bin: 100% 405M/405M [00:03<00:00, 104MB/s]\n",
            "Downloading shards:  39% 13/33 [00:39<01:17,  3.87s/it]\n",
            "Downloading (…)l-00014-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 97.7MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 158MB/s] \u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  31% 126M/405M [00:00<00:01, 206MB/s] \u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  39% 157M/405M [00:00<00:01, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  47% 189M/405M [00:00<00:01, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  54% 220M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  62% 252M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  70% 283M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  78% 315M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  83% 336M/405M [00:01<00:00, 188MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  88% 357M/405M [00:01<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin:  93% 377M/405M [00:01<00:00, 163MB/s]\u001b[A\n",
            "Downloading (…)l-00014-of-00033.bin: 100% 405M/405M [00:02<00:00, 184MB/s]\n",
            "Downloading shards:  42% 14/33 [00:41<01:04,  3.41s/it]\n",
            "Downloading (…)l-00015-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 92.3MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 152MB/s] \u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 172MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  21% 83.9M/405M [00:00<00:01, 191MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  26% 105M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  31% 126M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  39% 157M/405M [00:00<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  47% 189M/405M [00:00<00:01, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  54% 220M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  62% 252M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  70% 283M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  75% 304M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  80% 325M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin:  88% 357M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00015-of-00033.bin: 100% 405M/405M [00:02<00:00, 201MB/s]\n",
            "Downloading shards:  45% 15/33 [00:44<00:54,  3.03s/it]\n",
            "Downloading (…)l-00016-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 97.1MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 156MB/s] \u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  31% 126M/405M [00:00<00:01, 204MB/s] \u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  39% 157M/405M [00:00<00:01, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  47% 189M/405M [00:00<00:01, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  54% 220M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  62% 252M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  67% 273M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  73% 294M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  80% 325M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin:  88% 357M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00016-of-00033.bin: 100% 405M/405M [00:02<00:00, 202MB/s]\n",
            "Downloading shards:  48% 16/33 [00:46<00:46,  2.76s/it]\n",
            "Downloading (…)l-00017-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:   3% 10.5M/405M [00:00<00:03, 102MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 159MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  21% 83.9M/405M [00:00<00:01, 195MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  28% 115M/405M [00:00<00:01, 203MB/s] \u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  36% 147M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  41% 168M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  47% 189M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  54% 220M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  62% 252M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  70% 283M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  78% 315M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  85% 346M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin:  93% 377M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00017-of-00033.bin: 100% 405M/405M [00:02<00:00, 202MB/s]\n",
            "Downloading shards:  52% 17/33 [00:48<00:41,  2.58s/it]\n",
            "Downloading (…)l-00018-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 76.1MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 127MB/s] \u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  16% 62.9M/405M [00:00<00:02, 170MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  28% 115M/405M [00:00<00:01, 195MB/s] \u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  36% 147M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  44% 178M/405M [00:00<00:01, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  52% 210M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  60% 241M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  67% 273M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  75% 304M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  83% 336M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin:  91% 367M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00018-of-00033.bin: 100% 405M/405M [00:02<00:00, 199MB/s]\n",
            "Downloading shards:  55% 18/33 [00:50<00:36,  2.45s/it]\n",
            "Downloading (…)l-00019-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 91.3MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 150MB/s] \u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  28% 115M/405M [00:00<00:01, 197MB/s] \u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  34% 136M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  39% 157M/405M [00:00<00:01, 197MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  44% 178M/405M [00:00<00:01, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  49% 199M/405M [00:01<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  54% 220M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  65% 262M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  70% 283M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  75% 304M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  80% 325M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  85% 346M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin:  91% 367M/405M [00:01<00:00, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00019-of-00033.bin: 100% 405M/405M [00:02<00:00, 194MB/s]\n",
            "Downloading shards:  58% 19/33 [00:52<00:33,  2.38s/it]\n",
            "Downloading (…)l-00020-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 87.8MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 140MB/s] \u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  31% 126M/405M [00:00<00:01, 200MB/s] \u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  39% 157M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  44% 178M/405M [00:00<00:01, 205MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  52% 210M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  60% 241M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  65% 262M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  70% 283M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  78% 315M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  85% 346M/405M [00:01<00:00, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin:  93% 377M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00020-of-00033.bin: 100% 405M/405M [00:02<00:00, 200MB/s]\n",
            "Downloading shards:  61% 20/33 [00:54<00:30,  2.32s/it]\n",
            "Downloading (…)l-00021-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 79.5MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 131MB/s] \u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 161MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  26% 105M/405M [00:00<00:01, 194MB/s] \u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  34% 136M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  41% 168M/405M [00:00<00:01, 204MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  49% 199M/405M [00:01<00:01, 203MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  57% 231M/405M [00:01<00:00, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  65% 262M/405M [00:01<00:00, 209MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  73% 294M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  80% 325M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin:  88% 357M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00021-of-00033.bin: 100% 405M/405M [00:02<00:00, 197MB/s]\n",
            "Downloading shards:  64% 21/33 [00:57<00:27,  2.28s/it]\n",
            "Downloading (…)l-00022-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 98.0MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 158MB/s] \u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  13% 52.4M/405M [00:00<00:01, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 188MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  26% 105M/405M [00:00<00:01, 199MB/s] \u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  31% 126M/405M [00:00<00:01, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  39% 157M/405M [00:00<00:01, 206MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  47% 189M/405M [00:00<00:01, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  54% 220M/405M [00:01<00:00, 210MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  62% 252M/405M [00:01<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  70% 283M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  78% 315M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  85% 346M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin:  91% 367M/405M [00:01<00:00, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00022-of-00033.bin: 100% 405M/405M [00:02<00:00, 202MB/s]\n",
            "Downloading shards:  67% 22/33 [00:59<00:24,  2.24s/it]\n",
            "Downloading (…)l-00023-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 85.4MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:   8% 31.5M/405M [00:00<00:03, 121MB/s] \u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 152MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 170MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  28% 115M/405M [00:00<00:01, 186MB/s] \u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  34% 136M/405M [00:00<00:01, 189MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  39% 157M/405M [00:00<00:01, 193MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  44% 178M/405M [00:01<00:01, 196MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  49% 199M/405M [00:01<00:01, 198MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  54% 220M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  60% 241M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  65% 262M/405M [00:01<00:00, 200MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  70% 283M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  75% 304M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  80% 325M/405M [00:01<00:00, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  85% 346M/405M [00:01<00:00, 202MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin:  91% 367M/405M [00:01<00:00, 199MB/s]\u001b[A\n",
            "Downloading (…)l-00023-of-00033.bin: 100% 405M/405M [00:02<00:00, 188MB/s]\n",
            "Downloading shards:  70% 23/33 [01:01<00:22,  2.25s/it]\n",
            "Downloading (…)l-00024-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:   3% 10.5M/405M [00:00<00:03, 98.9MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 159MB/s] \u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  16% 62.9M/405M [00:00<00:01, 190MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 201MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  31% 126M/405M [00:00<00:01, 207MB/s] \u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  36% 147M/405M [00:00<00:01, 207MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  41% 168M/405M [00:00<00:01, 208MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  49% 199M/405M [00:00<00:00, 211MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  57% 231M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  65% 262M/405M [00:01<00:00, 216MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  73% 294M/405M [00:01<00:00, 216MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  80% 325M/405M [00:01<00:00, 213MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin:  88% 357M/405M [00:01<00:00, 212MB/s]\u001b[A\n",
            "Downloading (…)l-00024-of-00033.bin: 100% 405M/405M [00:01<00:00, 206MB/s]\n",
            "Downloading shards:  73% 24/33 [01:03<00:20,  2.23s/it]\n",
            "Downloading (…)l-00025-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 93.4MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 155MB/s] \u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 169MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  28% 115M/405M [00:00<00:01, 171MB/s] \u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  34% 136M/405M [00:00<00:01, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  39% 157M/405M [00:00<00:01, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  44% 178M/405M [00:01<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  49% 199M/405M [00:01<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  54% 220M/405M [00:01<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  60% 241M/405M [00:01<00:00, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  65% 262M/405M [00:01<00:00, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  70% 283M/405M [00:01<00:00, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  75% 304M/405M [00:01<00:00, 185MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  80% 325M/405M [00:01<00:00, 184MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  85% 346M/405M [00:01<00:00, 183MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin:  91% 367M/405M [00:02<00:00, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00025-of-00033.bin: 100% 405M/405M [00:02<00:00, 176MB/s]\n",
            "Downloading shards:  76% 25/33 [01:06<00:18,  2.29s/it]\n",
            "Downloading (…)l-00026-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 76.3MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:   8% 31.5M/405M [00:00<00:03, 123MB/s] \u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 145MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  18% 73.4M/405M [00:00<00:02, 162MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  28% 115M/405M [00:00<00:01, 175MB/s] \u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  34% 136M/405M [00:00<00:01, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  39% 157M/405M [00:00<00:01, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  44% 178M/405M [00:01<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  49% 199M/405M [00:01<00:01, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  54% 220M/405M [00:01<00:01, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  60% 241M/405M [00:01<00:00, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  65% 262M/405M [00:01<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  70% 283M/405M [00:01<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  75% 304M/405M [00:01<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  80% 325M/405M [00:01<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  85% 346M/405M [00:02<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin:  91% 367M/405M [00:02<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00026-of-00033.bin: 100% 405M/405M [00:02<00:00, 172MB/s]\n",
            "Downloading shards:  79% 26/33 [01:08<00:16,  2.35s/it]\n",
            "Downloading (…)l-00027-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 76.4MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:   8% 31.5M/405M [00:00<00:03, 120MB/s] \u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 145MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  18% 73.4M/405M [00:00<00:02, 157MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 162MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  28% 115M/405M [00:00<00:01, 166MB/s] \u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  34% 136M/405M [00:00<00:01, 170MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  39% 157M/405M [00:00<00:01, 173MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  44% 178M/405M [00:01<00:01, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  49% 199M/405M [00:01<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  54% 220M/405M [00:01<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  60% 241M/405M [00:01<00:00, 186MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  65% 262M/405M [00:01<00:00, 187MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  70% 283M/405M [00:01<00:00, 184MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  75% 304M/405M [00:01<00:00, 183MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  80% 325M/405M [00:01<00:00, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  85% 346M/405M [00:02<00:00, 182MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin:  91% 367M/405M [00:02<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00027-of-00033.bin: 100% 405M/405M [00:02<00:00, 171MB/s]\n",
            "Downloading shards:  82% 27/33 [01:11<00:14,  2.40s/it]\n",
            "Downloading (…)l-00028-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 94.8MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 145MB/s] \u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 161MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  18% 73.4M/405M [00:00<00:01, 170MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  28% 115M/405M [00:00<00:01, 177MB/s] \u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  34% 136M/405M [00:00<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  39% 157M/405M [00:00<00:01, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  44% 178M/405M [00:01<00:01, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  49% 199M/405M [00:01<00:01, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  54% 220M/405M [00:01<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  60% 241M/405M [00:01<00:00, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  65% 262M/405M [00:01<00:00, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  70% 283M/405M [00:01<00:00, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  75% 304M/405M [00:01<00:00, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  80% 325M/405M [00:01<00:00, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  85% 346M/405M [00:01<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin:  91% 367M/405M [00:02<00:00, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00028-of-00033.bin: 100% 405M/405M [00:02<00:00, 173MB/s]\n",
            "Downloading shards:  85% 28/33 [01:13<00:12,  2.42s/it]\n",
            "Downloading (…)l-00029-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 96.9MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 145MB/s] \u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 159MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  18% 73.4M/405M [00:00<00:02, 163MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 161MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  28% 115M/405M [00:00<00:01, 167MB/s] \u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  34% 136M/405M [00:00<00:01, 171MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  39% 157M/405M [00:00<00:01, 173MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  44% 178M/405M [00:01<00:01, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  49% 199M/405M [00:01<00:01, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  54% 220M/405M [00:01<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  60% 241M/405M [00:01<00:00, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  65% 262M/405M [00:01<00:00, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  70% 283M/405M [00:01<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  75% 304M/405M [00:01<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  80% 325M/405M [00:01<00:00, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  85% 346M/405M [00:02<00:00, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  91% 367M/405M [00:02<00:00, 139MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin:  96% 388M/405M [00:02<00:00, 149MB/s]\u001b[A\n",
            "Downloading (…)l-00029-of-00033.bin: 100% 405M/405M [00:02<00:00, 164MB/s]\n",
            "Downloading shards:  88% 29/33 [01:16<00:09,  2.48s/it]\n",
            "Downloading (…)l-00030-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 93.3MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 134MB/s] \u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 154MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  18% 73.4M/405M [00:00<00:02, 163MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  28% 115M/405M [00:00<00:01, 183MB/s] \u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  34% 136M/405M [00:00<00:01, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  39% 157M/405M [00:00<00:01, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  44% 178M/405M [00:01<00:01, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  49% 199M/405M [00:01<00:01, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  54% 220M/405M [00:01<00:01, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  60% 241M/405M [00:01<00:00, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  65% 262M/405M [00:01<00:00, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  70% 283M/405M [00:01<00:00, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  75% 304M/405M [00:01<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  80% 325M/405M [00:01<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  85% 346M/405M [00:02<00:00, 166MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  91% 367M/405M [00:02<00:00, 168MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin:  96% 388M/405M [00:02<00:00, 165MB/s]\u001b[A\n",
            "Downloading (…)l-00030-of-00033.bin: 100% 405M/405M [00:02<00:00, 169MB/s]\n",
            "Downloading shards:  91% 30/33 [01:18<00:07,  2.49s/it]\n",
            "Downloading (…)l-00031-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:   3% 10.5M/405M [00:00<00:04, 91.9MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:   8% 31.5M/405M [00:00<00:02, 133MB/s] \u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 146MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  18% 73.4M/405M [00:00<00:02, 158MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 166MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  28% 115M/405M [00:00<00:01, 170MB/s] \u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  34% 136M/405M [00:00<00:01, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  39% 157M/405M [00:00<00:01, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  44% 178M/405M [00:01<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  49% 199M/405M [00:01<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  54% 220M/405M [00:01<00:01, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  60% 241M/405M [00:01<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  65% 262M/405M [00:01<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  70% 283M/405M [00:01<00:00, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  75% 304M/405M [00:01<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  80% 325M/405M [00:01<00:00, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  85% 346M/405M [00:02<00:00, 162MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin:  91% 367M/405M [00:02<00:00, 165MB/s]\u001b[A\n",
            "Downloading (…)l-00031-of-00033.bin: 100% 405M/405M [00:02<00:00, 167MB/s]\n",
            "Downloading shards:  94% 31/33 [01:21<00:05,  2.52s/it]\n",
            "Downloading (…)l-00032-of-00033.bin:   0% 0.00/405M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:   3% 10.5M/405M [00:00<00:05, 75.8MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:   8% 31.5M/405M [00:00<00:03, 123MB/s] \u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  13% 52.4M/405M [00:00<00:02, 147MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  18% 73.4M/405M [00:00<00:02, 158MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  23% 94.4M/405M [00:00<00:01, 166MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  28% 115M/405M [00:00<00:01, 171MB/s] \u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  34% 136M/405M [00:00<00:01, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  39% 157M/405M [00:00<00:01, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  44% 178M/405M [00:01<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  49% 199M/405M [00:01<00:01, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  54% 220M/405M [00:01<00:01, 178MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  60% 241M/405M [00:01<00:00, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  65% 262M/405M [00:01<00:00, 175MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  70% 283M/405M [00:01<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  75% 304M/405M [00:01<00:00, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  80% 325M/405M [00:01<00:00, 177MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  85% 346M/405M [00:02<00:00, 176MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin:  91% 367M/405M [00:02<00:00, 180MB/s]\u001b[A\n",
            "Downloading (…)l-00032-of-00033.bin: 100% 405M/405M [00:02<00:00, 171MB/s]\n",
            "Downloading shards:  97% 32/33 [01:23<00:02,  2.51s/it]\n",
            "Downloading (…)l-00033-of-00033.bin:   0% 0.00/524M [00:00<?, ?B/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:   2% 10.5M/524M [00:00<00:05, 96.3MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:   6% 31.5M/524M [00:00<00:03, 155MB/s] \u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  10% 52.4M/524M [00:00<00:02, 179MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  14% 73.4M/524M [00:01<00:16, 27.0MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  18% 94.4M/524M [00:02<00:10, 39.2MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  22% 115M/524M [00:03<00:17, 22.8MB/s] \u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  26% 136M/524M [00:03<00:12, 32.0MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  30% 157M/524M [00:05<00:16, 22.1MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  34% 178M/524M [00:05<00:11, 30.3MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  36% 189M/524M [00:07<00:17, 18.6MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  38% 199M/524M [00:07<00:14, 22.4MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  42% 220M/524M [00:07<00:09, 33.0MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  44% 231M/524M [00:08<00:16, 18.3MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  46% 241M/524M [00:08<00:12, 22.3MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  50% 262M/524M [00:09<00:07, 33.8MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  54% 283M/524M [00:09<00:05, 48.0MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  58% 304M/524M [00:09<00:03, 64.4MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  62% 325M/524M [00:09<00:02, 82.6MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  66% 346M/524M [00:09<00:01, 102MB/s] \u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  70% 367M/524M [00:09<00:01, 120MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  74% 388M/524M [00:09<00:00, 137MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  78% 409M/524M [00:09<00:00, 151MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  82% 430M/524M [00:09<00:00, 165MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  86% 451M/524M [00:10<00:00, 174MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  90% 472M/524M [00:10<00:00, 181MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin:  96% 503M/524M [00:10<00:00, 192MB/s]\u001b[A\n",
            "Downloading (…)l-00033-of-00033.bin: 100% 524M/524M [00:10<00:00, 50.1MB/s]\n",
            "Downloading shards: 100% 33/33 [01:34<00:00,  2.86s/it]\n",
            "Loading checkpoint shards: 100% 33/33 [00:15<00:00,  2.16it/s]\n",
            "Downloading (…)neration_config.json: 100% 124/124 [00:00<00:00, 18.4kB/s]\n",
            "Extended vocabulary size: 49954\n",
            "Loading LoRA for 7B model\n",
            "Downloading (…)/adapter_config.json: 100% 472/472 [00:00<00:00, 170kB/s]\n",
            "Downloading adapter_model.bin: 100% 858M/858M [00:09<00:00, 94.8MB/s]\n",
            "Peft version: 0.2.0\n",
            "Merging model\n",
            "Saving shard 1 of 1 into alpaca-combined/consolidated.00.pth\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## 量化模型\n",
        "接下来我们使用[llama.cpp](https://github.com/ggerganov/llama.cpp)工具对上一步生成的全量版本权重进行转换，生成4-bit量化模型。\n",
        "\n",
        "### 编译工具\n",
        "\n",
        "首先对llama.cpp工具进行编译。"
      ],
      "metadata": {
        "id": "ueexcKo-Q_EW"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && make"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_GbjsT2wRRCR",
        "outputId": "116fb0e3-ea38-442c-bf7b-531349d43dc1"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "I llama.cpp build info: \n",
            "I UNAME_S:  Linux\n",
            "I UNAME_P:  x86_64\n",
            "I UNAME_M:  x86_64\n",
            "I CFLAGS:   -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native\n",
            "I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native\n",
            "I LDFLAGS:  \n",
            "I CC:       cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
            "I CXX:      g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0\n",
            "\n",
            "cc  -I.              -O3 -DNDEBUG -std=c11   -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -Wno-unused-function -pthread -march=native -mtune=native   -c ggml.c -o ggml.o\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c llama.cpp -o llama.o\n",
            "In file included from \u001b[01m\u001b[Kllama.cpp:1\u001b[m\u001b[K:\n",
            "\u001b[01m\u001b[Kllama_util.h:58:2:\u001b[m\u001b[K \u001b[01;35m\u001b[Kwarning: \u001b[m\u001b[Kextra ‘\u001b[01m\u001b[K;\u001b[m\u001b[K’ [\u001b[01;35m\u001b[K-Wpedantic\u001b[m\u001b[K]\n",
            "   58 | }\u001b[01;35m\u001b[K;\u001b[m\u001b[K\n",
            "      |  \u001b[01;35m\u001b[K^\u001b[m\u001b[K\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -c examples/common.cpp -o common.o\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/main/main.cpp ggml.o llama.o common.o -o main \n",
            "\n",
            "====  Run ./main -h for help.  ====\n",
            "\n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/quantize/quantize.cpp ggml.o llama.o -o quantize \n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/perplexity/perplexity.cpp ggml.o llama.o common.o -o perplexity \n",
            "g++ -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native examples/embedding/embedding.cpp ggml.o llama.o common.o -o embedding \n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "### 模型转换为ggml格式（FP16）\n",
        "\n",
        "这一步，我们将模型转换为ggml格式（FP16）。\n",
        "- 在这之前需要把`alpaca-combined`目录挪个位置，并且保证符合转换脚本的要求。\n",
        "- tokenizer文件需要在模型文件的父节点上（注意这里使用的是中文Alpaca模型附带的文件，而不是合并模型步骤转换出来的）。\n",
        "- 这里我们直接从 https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b/resolve/main/tokenizer.model 下载中文Alpaca-7B的tokenizer.model文件。\n",
        "\n",
        "💡 转换13B模型提示：\n",
        "- tokenizer可以直接用7B的，13B和7B的相同\n",
        "- llama和alpaca的tokenizer不可混用\n",
        "- 以下看到7B字样的都是文件夹名，与转换过程没有关系了，改不改都行"
      ],
      "metadata": {
        "id": "gw2xpYC0RcQC"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && mkdir zh-models && mv ../alpaca-combined zh-models/7B"
      ],
      "metadata": {
        "id": "5KgnFVStRjio"
      },
      "execution_count": 5,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp/zh-models && wget https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b/resolve/main/tokenizer.model"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Cl2E2WBPSnmw",
        "outputId": "56bb0025-bf1f-44fb-b899-b5cde02ad56d"
      },
      "execution_count": 6,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "--2023-04-11 10:12:13--  https://huggingface.co/ziqingyang/chinese-alpaca-lora-7b/resolve/main/tokenizer.model\n",
            "Resolving huggingface.co (huggingface.co)... 13.249.85.92, 13.249.85.69, 13.249.85.16, ...\n",
            "Connecting to huggingface.co (huggingface.co)|13.249.85.92|:443... connected.\n",
            "HTTP request sent, awaiting response... 302 Found\n",
            "Location: https://cdn-lfs.huggingface.co/repos/0f/01/0f01544c04c27e0a0357540e7be5763000a215cedb3be4a0356b56983f2fd5e3/2d967e855b1213a439df6c8ce2791f869c84b4f3b6cfacf22b86440b8192a2f8?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27tokenizer.model%3B+filename%3D%22tokenizer.model%22%3B&Expires=1681466367&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzBmLzAxLzBmMDE1NDRjMDRjMjdlMGEwMzU3NTQwZTdiZTU3NjMwMDBhMjE1Y2VkYjNiZTRhMDM1NmI1Njk4M2YyZmQ1ZTMvMmQ5NjdlODU1YjEyMTNhNDM5ZGY2YzhjZTI3OTFmODY5Yzg0YjRmM2I2Y2ZhY2YyMmI4NjQ0MGI4MTkyYTJmOD9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODE0NjYzNjd9fX1dfQ__&Signature=ZYXopvfhgU9alvOYAGiIlHW1HZLHNeTff2IqXQDMEdVq%7EH98S1AoYFNrlRnTdH0wKPagUtTNIOIs9Gv9ufNyGEiwxnqzgOFKMyj-yy1OoDsF%7Em06L8Mz42YGfGk-tdCBkUZvXo%7ERGHS8-GVldEW9lVi17fQQ1EI07RoANnD1aEZUi7r6SCG9gySAxQIxPl5vMqNCYlUOGSSUpdEUqPYT0xXHQM2oWT95Ig9MKZLp44jTYTalvVRweR9I34cCfS6FD6EiDmCuBq4OkOLiOOQ99UysVRfTw2NmC04bFsP0HJQPb4bmIz9QVkc643fRDVp4-dK5PMkRXsYMexoVYOOHaA__&Key-Pair-Id=KVTP0A1DKRTAX [following]\n",
            "--2023-04-11 10:12:14--  https://cdn-lfs.huggingface.co/repos/0f/01/0f01544c04c27e0a0357540e7be5763000a215cedb3be4a0356b56983f2fd5e3/2d967e855b1213a439df6c8ce2791f869c84b4f3b6cfacf22b86440b8192a2f8?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27tokenizer.model%3B+filename%3D%22tokenizer.model%22%3B&Expires=1681466367&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zLzBmLzAxLzBmMDE1NDRjMDRjMjdlMGEwMzU3NTQwZTdiZTU3NjMwMDBhMjE1Y2VkYjNiZTRhMDM1NmI1Njk4M2YyZmQ1ZTMvMmQ5NjdlODU1YjEyMTNhNDM5ZGY2YzhjZTI3OTFmODY5Yzg0YjRmM2I2Y2ZhY2YyMmI4NjQ0MGI4MTkyYTJmOD9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODE0NjYzNjd9fX1dfQ__&Signature=ZYXopvfhgU9alvOYAGiIlHW1HZLHNeTff2IqXQDMEdVq%7EH98S1AoYFNrlRnTdH0wKPagUtTNIOIs9Gv9ufNyGEiwxnqzgOFKMyj-yy1OoDsF%7Em06L8Mz42YGfGk-tdCBkUZvXo%7ERGHS8-GVldEW9lVi17fQQ1EI07RoANnD1aEZUi7r6SCG9gySAxQIxPl5vMqNCYlUOGSSUpdEUqPYT0xXHQM2oWT95Ig9MKZLp44jTYTalvVRweR9I34cCfS6FD6EiDmCuBq4OkOLiOOQ99UysVRfTw2NmC04bFsP0HJQPb4bmIz9QVkc643fRDVp4-dK5PMkRXsYMexoVYOOHaA__&Key-Pair-Id=KVTP0A1DKRTAX\n",
            "Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 13.249.85.116, 13.249.85.11, 13.249.85.23, ...\n",
            "Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|13.249.85.116|:443... connected.\n",
            "HTTP request sent, awaiting response... 200 OK\n",
            "Length: 757972 (740K) [binary/octet-stream]\n",
            "Saving to: ‘tokenizer.model’\n",
            "\n",
            "tokenizer.model     100%[===================>] 740.21K  --.-KB/s    in 0.05s   \n",
            "\n",
            "2023-04-11 10:12:14 (14.7 MB/s) - ‘tokenizer.model’ saved [757972/757972]\n",
            "\n"
          ]
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && python convert-pth-to-ggml.py zh-models/7B/ 1"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "NUHeoTMQS1AQ",
        "outputId": "ac525063-b595-417c-e11c-225278701023"
      },
      "execution_count": 7,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{'dim': 4096, 'multiple_of': 256, 'n_heads': 32, 'n_layers': 32, 'norm_eps': 1e-06, 'vocab_size': -1}\n",
            "Namespace(dir_model='zh-models/7B/', ftype=1, vocab_only=0)\n",
            "n_parts = 1\n",
            "\n",
            "Processing part 1 of 1\n",
            "\n",
            "Processing variable: tok_embeddings.weight with shape: (49954, 4096) and type: torch.float16\n",
            "Processing variable: layers.0.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.0.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.0.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.0.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.0.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.0.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.0.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.0.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.0.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.1.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.1.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.1.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.1.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.1.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.1.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.1.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.1.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.1.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.2.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.2.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.2.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.2.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.2.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.2.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.2.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.2.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.2.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.3.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.3.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.3.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.3.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.3.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.3.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.3.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.3.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.3.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.4.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.4.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.4.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.4.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.4.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.4.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.4.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.4.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.4.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.5.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.5.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.5.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.5.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.5.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.5.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.5.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.5.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.5.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.6.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.6.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.6.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.6.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.6.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.6.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.6.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.6.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.6.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.7.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.7.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.7.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.7.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.7.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.7.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.7.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.7.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.7.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.8.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.8.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.8.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.8.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.8.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.8.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.8.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.8.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.8.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.9.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.9.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.9.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.9.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.9.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.9.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.9.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.9.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.9.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.10.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.10.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.10.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.10.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.10.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.10.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.10.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.10.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.10.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.11.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.11.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.11.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.11.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.11.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.11.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.11.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.11.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.11.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.12.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.12.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.12.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.12.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.12.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.12.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.12.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.12.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.12.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.13.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.13.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.13.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.13.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.13.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.13.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.13.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.13.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.13.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.14.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.14.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.14.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.14.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.14.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.14.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.14.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.14.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.14.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.15.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.15.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.15.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.15.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.15.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.15.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.15.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.15.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.15.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.16.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.16.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.16.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.16.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.16.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.16.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.16.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.16.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.16.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.17.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.17.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.17.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.17.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.17.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.17.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.17.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.17.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.17.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.18.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.18.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.18.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.18.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.18.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.18.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.18.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.18.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.18.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.19.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.19.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.19.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.19.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.19.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.19.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.19.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.19.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.19.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.20.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.20.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.20.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.20.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.20.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.20.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.20.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.20.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.20.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.21.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.21.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.21.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.21.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.21.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.21.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.21.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.21.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.21.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.22.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.22.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.22.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.22.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.22.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.22.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.22.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.22.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.22.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.23.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.23.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.23.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.23.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.23.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.23.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.23.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.23.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.23.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.24.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.24.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.24.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.24.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.24.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.24.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.24.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.24.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.24.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.25.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.25.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.25.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.25.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.25.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.25.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.25.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.25.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.25.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.26.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.26.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.26.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.26.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.26.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.26.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.26.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.26.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.26.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.27.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.27.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.27.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.27.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.27.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.27.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.27.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.27.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.27.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.28.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.28.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.28.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.28.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.28.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.28.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.28.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.28.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.28.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.29.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.29.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.29.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.29.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.29.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.29.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.29.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.29.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.29.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.30.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.30.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.30.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.30.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.30.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.30.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.30.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.30.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.30.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.31.attention.wq.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.31.attention.wk.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.31.attention.wv.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.31.attention.wo.weight with shape: (4096, 4096) and type: torch.float16\n",
            "Processing variable: layers.31.feed_forward.w1.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.31.feed_forward.w2.weight with shape: (4096, 11008) and type: torch.float16\n",
            "Processing variable: layers.31.feed_forward.w3.weight with shape: (11008, 4096) and type: torch.float16\n",
            "Processing variable: layers.31.attention_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: layers.31.ffn_norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: norm.weight with shape: (4096,) and type: torch.float16\n",
            "  Converting to float32\n",
            "Processing variable: output.weight with shape: (49954, 4096) and type: torch.float16\n",
            "Done. Output file: zh-models/7B//ggml-model-f16.bin\n",
            "\n"
          ]
        }
      ]
    },
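    {
      "cell_type": "markdown",
      "source": [
        "A pattern worth noting in the conversion log above: every 2-D weight matrix is kept in float16, while each 1-D normalization weight (`attention_norm.weight`, `ffn_norm.weight`, `norm.weight`) triggers \"Converting to float32\". The next cell is a hypothetical re-creation of that rule, for illustration only; the actual logic lives in the conversion script."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "import numpy as np\n",
        "\n",
        "# Hypothetical re-creation (illustration only) of the dtype rule seen in the\n",
        "# conversion log: when writing an f16 model, tiny 1-D tensors such as the\n",
        "# norm weights are stored as float32, while 2-D matrices stay float16.\n",
        "def convert_dtype(data):\n",
        "    if data.ndim == 1 and data.dtype == np.float16:\n",
        "        print(\"  Converting to float32\")\n",
        "        return data.astype(np.float32)\n",
        "    return data"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },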
    {
      "cell_type": "markdown",
      "source": [
        "### 将FP16模型量化为4-bit\n",
        "\n",
        "**⚠️ 本步骤消耗内存峰值为<font size=\"5\">3-4G</font>左右，运行前务必确认是否有足够空闲内存！**\n",
        "\n",
        "我们进一步将FP16模型转换为4-bit量化模型。"
      ],
      "metadata": {
        "id": "hEZEJAVYCHkc"
      }
    },
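    {
      "cell_type": "markdown",
      "source": [
        "Before running the actual quantization, the next cell gives a minimal NumPy sketch of the idea behind block-wise 4-bit (q4_0-style) quantization. It is an illustrative approximation, not llama.cpp's exact code: each block of 32 weights shares one scale, and every weight is rounded to one of 16 levels."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# A minimal NumPy sketch of block-wise 4-bit quantization in the spirit of q4_0.\n",
        "# Simplified for illustration: the real llama.cpp format packs two 4-bit values\n",
        "# per byte and stores one float32 scale per 32-value block (hence the\n",
        "# 32.00 MB -> 10.00 MB ratio seen in the quantization log below).\n",
        "import numpy as np\n",
        "\n",
        "QK = 32  # q4_0 block size in ggml\n",
        "\n",
        "def quantize_block(x):\n",
        "    # One shared scale per block: map the largest magnitude onto +/-7.\n",
        "    amax = np.abs(x).max()\n",
        "    d = amax / 7.0 if amax > 0 else 1.0\n",
        "    q = (np.clip(np.round(x / d), -8, 7) + 8).astype(np.uint8)  # values in [0, 15]\n",
        "    return d, q\n",
        "\n",
        "def dequantize_block(d, q):\n",
        "    # Inverse mapping; the per-weight error is at most about d/2.\n",
        "    return (q.astype(np.float32) - 8) * d\n",
        "\n",
        "x = np.random.randn(QK).astype(np.float32)\n",
        "d, q = quantize_block(x)\n",
        "print(\"max abs error:\", np.abs(x - dequantize_block(d, q)).max())"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },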
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && ./quantize ./zh-models/7B/ggml-model-f16.bin ./zh-models/7B/ggml-model-q4_0.bin 2"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "2xyais7OUVDI",
        "outputId": "f906545e-4713-49b5-a462-e6d0c0039f12"
      },
      "execution_count": 8,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "llama.cpp: loading model from ./zh-models/7B/ggml-model-f16.bin\n",
            "llama.cpp: saving model to ./zh-models/7B/ggml-model-q4_0.bin\n",
            "[1/291]                tok_embeddings.weight - [4096 x 49954], type =    f16, quantizing .. size =   390.27 MB ->   121.96 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[2/291]         layers.0.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.071 0.103 0.137 0.158 0.137 0.103 0.071 0.046 0.028 0.016 0.021 \n",
            "[3/291]         layers.0.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.027 0.045 0.071 0.104 0.138 0.158 0.139 0.104 0.071 0.045 0.027 0.016 0.021 \n",
            "[4/291]         layers.0.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.018 0.032 0.051 0.076 0.103 0.128 0.141 0.128 0.103 0.075 0.051 0.032 0.019 0.022 \n",
            "[5/291]         layers.0.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.072 0.105 0.136 0.151 0.136 0.105 0.072 0.046 0.028 0.016 0.021 \n",
            "[6/291]      layers.0.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[7/291]      layers.0.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[8/291]      layers.0.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[9/291]       layers.0.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[10/291]             layers.0.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[11/291]         layers.1.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.051 0.077 0.104 0.127 0.137 0.127 0.104 0.077 0.051 0.032 0.019 0.022 \n",
            "[12/291]         layers.1.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.018 0.032 0.051 0.076 0.104 0.128 0.138 0.128 0.104 0.077 0.051 0.032 0.018 0.022 \n",
            "[13/291]         layers.1.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.018 0.031 0.051 0.076 0.104 0.129 0.139 0.129 0.104 0.076 0.051 0.031 0.018 0.021 \n",
            "[14/291]         layers.1.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.021 0.016 0.028 0.046 0.071 0.104 0.137 0.154 0.137 0.104 0.071 0.046 0.028 0.016 0.021 \n",
            "[15/291]      layers.1.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[16/291]      layers.1.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[17/291]      layers.1.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[18/291]       layers.1.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[19/291]             layers.1.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[20/291]         layers.2.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[21/291]         layers.2.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.051 0.076 0.104 0.127 0.137 0.127 0.104 0.077 0.051 0.032 0.019 0.022 \n",
            "[22/291]         layers.2.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.136 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[23/291]         layers.2.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[24/291]      layers.2.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[25/291]      layers.2.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[26/291]      layers.2.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[27/291]       layers.2.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[28/291]             layers.2.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[29/291]         layers.3.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[30/291]         layers.3.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.136 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[31/291]         layers.3.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[32/291]         layers.3.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[33/291]      layers.3.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[34/291]      layers.3.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[35/291]      layers.3.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[36/291]       layers.3.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[37/291]             layers.3.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[38/291]         layers.4.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[39/291]         layers.4.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[40/291]         layers.4.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[41/291]         layers.4.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[42/291]      layers.4.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[43/291]      layers.4.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[44/291]      layers.4.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[45/291]       layers.4.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[46/291]             layers.4.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[47/291]         layers.5.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[48/291]         layers.5.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[49/291]         layers.5.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[50/291]         layers.5.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[51/291]      layers.5.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[52/291]      layers.5.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[53/291]      layers.5.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[54/291]       layers.5.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[55/291]             layers.5.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[56/291]         layers.6.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[57/291]         layers.6.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[58/291]         layers.6.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[59/291]         layers.6.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[60/291]      layers.6.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[61/291]      layers.6.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[62/291]      layers.6.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[63/291]       layers.6.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[64/291]             layers.6.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[65/291]         layers.7.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[66/291]         layers.7.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[67/291]         layers.7.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[68/291]         layers.7.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[69/291]      layers.7.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[70/291]      layers.7.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[71/291]      layers.7.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[72/291]       layers.7.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[73/291]             layers.7.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[74/291]         layers.8.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[75/291]         layers.8.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[76/291]         layers.8.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[77/291]         layers.8.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[78/291]      layers.8.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[79/291]      layers.8.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[80/291]      layers.8.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[81/291]       layers.8.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[82/291]             layers.8.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[83/291]         layers.9.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[84/291]         layers.9.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[85/291]         layers.9.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[86/291]         layers.9.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[87/291]      layers.9.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[88/291]      layers.9.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[89/291]      layers.9.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[90/291]       layers.9.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[91/291]             layers.9.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[92/291]        layers.10.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[93/291]        layers.10.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[94/291]        layers.10.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[95/291]        layers.10.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[96/291]     layers.10.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[97/291]     layers.10.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[98/291]     layers.10.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[99/291]      layers.10.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[100/291]            layers.10.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[101/291]        layers.11.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[102/291]        layers.11.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[103/291]        layers.11.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[104/291]        layers.11.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[105/291]     layers.11.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[106/291]     layers.11.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[107/291]     layers.11.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[108/291]      layers.11.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[109/291]            layers.11.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[110/291]        layers.12.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[111/291]        layers.12.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[112/291]        layers.12.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[113/291]        layers.12.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[114/291]     layers.12.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[115/291]     layers.12.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[116/291]     layers.12.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[117/291]      layers.12.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[118/291]            layers.12.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[119/291]        layers.13.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[120/291]        layers.13.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[121/291]        layers.13.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[122/291]        layers.13.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[123/291]     layers.13.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[124/291]     layers.13.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[125/291]     layers.13.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[126/291]      layers.13.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[127/291]            layers.13.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[128/291]        layers.14.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[129/291]        layers.14.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[130/291]        layers.14.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[131/291]        layers.14.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[132/291]     layers.14.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[133/291]     layers.14.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[134/291]     layers.14.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[135/291]      layers.14.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[136/291]            layers.14.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[137/291]        layers.15.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[138/291]        layers.15.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[139/291]        layers.15.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[140/291]        layers.15.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[141/291]     layers.15.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[142/291]     layers.15.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[143/291]     layers.15.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[144/291]      layers.15.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[145/291]            layers.15.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[146/291]        layers.16.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[147/291]        layers.16.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[148/291]        layers.16.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.125 0.135 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[149/291]        layers.16.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[150/291]     layers.16.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[151/291]     layers.16.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.126 0.134 0.126 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[152/291]     layers.16.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[153/291]      layers.16.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[154/291]            layers.16.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[155/291]        layers.17.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[156/291]        layers.17.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[157/291]        layers.17.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[158/291]        layers.17.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[159/291]     layers.17.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[160/291]     layers.17.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.052 0.033 0.019 0.022 \n",
            "[161/291]     layers.17.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[162/291]      layers.17.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[163/291]            layers.17.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[164/291]        layers.18.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[165/291]        layers.18.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[166/291]        layers.18.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[167/291]        layers.18.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[168/291]     layers.18.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[169/291]     layers.18.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[170/291]     layers.18.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[171/291]      layers.18.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[172/291]            layers.18.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[173/291]        layers.19.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[174/291]        layers.19.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[175/291]        layers.19.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[176/291]        layers.19.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[177/291]     layers.19.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[178/291]     layers.19.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[179/291]     layers.19.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[180/291]      layers.19.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[181/291]            layers.19.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[182/291]        layers.20.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[183/291]        layers.20.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[184/291]        layers.20.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[185/291]        layers.20.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[186/291]     layers.20.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[187/291]     layers.20.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[188/291]     layers.20.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[189/291]      layers.20.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[190/291]            layers.20.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[191/291]        layers.21.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[192/291]        layers.21.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[193/291]        layers.21.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[194/291]        layers.21.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[195/291]     layers.21.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[196/291]     layers.21.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[197/291]     layers.21.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[198/291]      layers.21.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[199/291]            layers.21.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[200/291]        layers.22.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[201/291]        layers.22.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[202/291]        layers.22.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[203/291]        layers.22.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.124 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[204/291]     layers.22.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[205/291]     layers.22.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[206/291]     layers.22.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[207/291]      layers.22.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[208/291]            layers.22.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[209/291]        layers.23.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[210/291]        layers.23.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[211/291]        layers.23.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[212/291]        layers.23.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[213/291]     layers.23.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[214/291]     layers.23.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[215/291]     layers.23.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[216/291]      layers.23.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[217/291]            layers.23.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[218/291]        layers.24.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[219/291]        layers.24.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[220/291]        layers.24.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[221/291]        layers.24.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.124 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[222/291]     layers.24.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[223/291]     layers.24.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[224/291]     layers.24.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[225/291]      layers.24.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[226/291]            layers.24.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[227/291]        layers.25.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[228/291]        layers.25.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[229/291]        layers.25.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[230/291]        layers.25.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[231/291]     layers.25.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[232/291]     layers.25.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[233/291]     layers.25.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[234/291]      layers.25.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[235/291]            layers.25.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[236/291]        layers.26.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[237/291]        layers.26.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[238/291]        layers.26.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[239/291]        layers.26.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[240/291]     layers.26.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[241/291]     layers.26.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[242/291]     layers.26.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[243/291]      layers.26.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[244/291]            layers.26.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[245/291]        layers.27.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[246/291]        layers.27.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[247/291]        layers.27.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[248/291]        layers.27.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[249/291]     layers.27.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[250/291]     layers.27.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[251/291]     layers.27.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[252/291]      layers.27.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[253/291]            layers.27.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[254/291]        layers.28.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[255/291]        layers.28.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[256/291]        layers.28.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[257/291]        layers.28.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[258/291]     layers.28.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.132 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[259/291]     layers.28.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[260/291]     layers.28.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[261/291]      layers.28.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[262/291]            layers.28.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[263/291]        layers.29.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[264/291]        layers.29.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[265/291]        layers.29.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[266/291]        layers.29.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[267/291]     layers.29.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[268/291]     layers.29.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[269/291]     layers.29.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[270/291]      layers.29.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[271/291]            layers.29.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[272/291]        layers.30.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.126 0.134 0.125 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[273/291]        layers.30.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[274/291]        layers.30.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.077 0.053 0.033 0.019 0.022 \n",
            "[275/291]        layers.30.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[276/291]     layers.30.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[277/291]     layers.30.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.018 0.032 0.051 0.076 0.104 0.128 0.137 0.128 0.104 0.076 0.051 0.032 0.018 0.022 \n",
            "[278/291]     layers.30.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[279/291]      layers.30.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[280/291]            layers.30.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[281/291]        layers.31.attention.wq.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[282/291]        layers.31.attention.wk.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "[283/291]        layers.31.attention.wv.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.032 0.052 0.077 0.104 0.126 0.135 0.126 0.104 0.077 0.052 0.032 0.019 0.022 \n",
            "[284/291]        layers.31.attention.wo.weight - [4096 x 4096], type =    f16, quantizing .. size =    32.00 MB ->    10.00 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[285/291]     layers.31.feed_forward.w1.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.133 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[286/291]     layers.31.feed_forward.w2.weight - [11008 x 4096], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.021 0.018 0.031 0.050 0.075 0.104 0.130 0.140 0.130 0.104 0.075 0.050 0.031 0.018 0.021 \n",
            "[287/291]     layers.31.feed_forward.w3.weight - [4096 x 11008], type =    f16, quantizing .. size =    86.00 MB ->    26.88 MB | hist: 0.000 0.022 0.019 0.033 0.053 0.077 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "[288/291]      layers.31.attention_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[289/291]            layers.31.ffn_norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[290/291]                          norm.weight - [4096], type =    f32, size =    0.016 MB\n",
            "[291/291]                        output.weight - [4096 x 49954], type =    f16, quantizing .. size =   390.27 MB ->   121.96 MB | hist: 0.000 0.022 0.019 0.033 0.052 0.077 0.104 0.126 0.134 0.126 0.104 0.077 0.052 0.033 0.019 0.022 \n",
            "llama_model_quantize_internal: model size  = 13133.55 MB\n",
            "llama_model_quantize_internal: quant size  =  4104.93 MB\n",
            "llama_model_quantize_internal: hist: 0.000 0.022 0.019 0.033 0.053 0.078 0.104 0.125 0.134 0.125 0.104 0.078 0.053 0.033 0.019 0.022 \n",
            "\n",
            "main: quantize time = 135032.60 ms\n",
            "main:    total time = 135032.60 ms\n"
          ]
        }
      ]
    },
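    {
      "cell_type": "markdown",
      "source": [
        "Each f16 weight matrix above shrinks by the same ratio (32.00 MB -> 10.00 MB, 86.00 MB -> 26.88 MB), and the `hist:` columns show how the quantized values are distributed over the 16 possible 4-bit codes. The ratio follows from the q4_0 layout: weights are grouped into blocks of 32, and each block is stored as one f32 scale plus 32 packed 4-bit values, i.e. 20 bytes per 32 weights, or 5 bits per weight instead of 16. A minimal sketch of that arithmetic (the block layout is our reading of q4_0 at this llama.cpp revision; the helper name is ours):\n",
        "\n",
        "```python\n",
        "# Hypothetical helper: estimate the q4_0 size of an f16 tensor.\n",
        "# Assumed q4_0 layout: blocks of 32 weights, each stored as one\n",
        "# f32 scale (4 bytes) + 32 packed 4-bit values (16 bytes) = 20 bytes.\n",
        "def q4_0_size_mb(rows: int, cols: int) -> float:\n",
        "    n_blocks = rows * cols // 32\n",
        "    return n_blocks * 20 / 1024**2\n",
        "\n",
        "print(q4_0_size_mb(4096, 4096))    # ~10.00 MB (32.00 MB as f16)\n",
        "print(q4_0_size_mb(4096, 11008))   # ~26.88 MB (86.00 MB as f16)\n",
        "```\n",
        "\n",
        "The totals line up as well: 13133.55 MB of f16 weights become 4104.93 MB, essentially the same 16-to-5-bit ratio once the small f32 norm vectors (kept unquantized) are accounted for."
      ],
      "metadata": {}
    },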
    {
      "cell_type": "markdown",
      "source": [
        "### （可选）测试量化模型解码\n",
        "至此已完成了所有转换步骤。\n",
        "我们运行一条命令测试一下是否能够正常加载并进行对话。\n",
        "\n",
        "FP16和Q4量化文件存放在./llama.cpp/zh-models/7B下，可按需下载使用。"
      ],
      "metadata": {
        "id": "DLkuRAo9Vkb1"
      }
    },
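    {
      "cell_type": "markdown",
      "source": [
        "As a sanity check before decoding, the cell below lists the converted files and their sizes using only the Python standard library. The only assumption is the output directory used in the steps above (./llama.cpp/zh-models/7B)."
      ],
      "metadata": {}
    },
    {
      "cell_type": "code",
      "source": [
        "# List the converted model files under the output directory;\n",
        "# expect an FP16 file (roughly 13 GB) and a q4_0 file (roughly 4 GB).\n",
        "from pathlib import Path\n",
        "\n",
        "for f in sorted(Path('llama.cpp/zh-models/7B').glob('ggml-model-*')):\n",
        "    print(f.name, f'{f.stat().st_size / 1024**3:.2f} GB')"
      ],
      "metadata": {},
      "execution_count": null,
      "outputs": []
    },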
    {
      "cell_type": "code",
      "source": [
        "!cd llama.cpp && ./main -m ./zh-models/7B/ggml-model-q4_0.bin --color -f ./prompts/alpaca.txt -p \"详细介绍一下北京的名胜古迹：\" -n 512"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "tW-ep1BsVQtG",
        "outputId": "70110923-e42e-4c5e-e329-86cec4205ec5"
      },
      "execution_count": 9,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "main: seed = 1681208198\n",
            "llama.cpp: loading model from ./zh-models/7B/ggml-model-q4_0.bin\n",
            "llama_model_load_internal: format     = ggjt v1 (latest)\n",
            "llama_model_load_internal: n_vocab    = 49954\n",
            "llama_model_load_internal: n_ctx      = 512\n",
            "llama_model_load_internal: n_embd     = 4096\n",
            "llama_model_load_internal: n_mult     = 256\n",
            "llama_model_load_internal: n_head     = 32\n",
            "llama_model_load_internal: n_layer    = 32\n",
            "llama_model_load_internal: n_rot      = 128\n",
            "llama_model_load_internal: f16        = 2\n",
            "llama_model_load_internal: n_ff       = 11008\n",
            "llama_model_load_internal: n_parts    = 1\n",
            "llama_model_load_internal: model size = 7B\n",
            "llama_model_load_internal: ggml ctx size =  59.11 KB\n",
            "llama_model_load_internal: mem required  = 5896.99 MB (+ 1026.00 MB per state)\n",
            "llama_init_from_file: kv self size  =  256.00 MB\n",
            "\n",
            "system_info: n_threads = 40 / 40 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 | \n",
            "sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000\n",
            "generate: n_ctx = 512, n_batch = 8, n_predict = 512, n_keep = 0\n",
            "\n",
            "\n",
            "\u001b[33m 详细介绍一下北京的名胜古迹：\u001b[0m\n",
            " 故宫是北京最着名的古迹，也是中国最著名的旅游景点之一。 [end of text]\n",
            "\n",
            "llama_print_timings:        load time =  2984.28 ms\n",
            "llama_print_timings:      sample time =    35.30 ms /    23 runs   (    1.53 ms per run)\n",
            "llama_print_timings: prompt eval time =  3215.45 ms /    11 tokens (  292.31 ms per token)\n",
            "llama_print_timings:        eval time = 10391.94 ms /    22 runs   (  472.36 ms per run)\n",
            "llama_print_timings:       total time = 13867.30 ms\n"
          ]
        }
      ]
    }
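    ,
    {
      "cell_type": "markdown",
      "source": [
        "The `sampling:` line above records the decoding parameters llama.cpp applied: temperature 0.8, top-k 40, top-p 0.95, plus a repetition penalty of 1.1 over the last 64 tokens. A simplified sketch of how the first three interact is below; this is an illustration only, not llama.cpp's actual code path, and it omits the repetition penalty:\n",
        "\n",
        "```python\n",
        "# Simplified temperature / top-k / top-p sampling, for illustration.\n",
        "import numpy as np\n",
        "\n",
        "def sample(logits, temp=0.8, top_k=40, top_p=0.95):\n",
        "    logits = np.asarray(logits, dtype=np.float64) / temp   # temperature\n",
        "    order = np.argsort(logits)[::-1][:top_k]               # keep top-k ids\n",
        "    probs = np.exp(logits[order] - logits[order].max())\n",
        "    probs /= probs.sum()\n",
        "    keep = np.searchsorted(np.cumsum(probs), top_p) + 1    # nucleus cutoff\n",
        "    probs = probs[:keep] / probs[:keep].sum()\n",
        "    return int(np.random.choice(order[:keep], p=probs))\n",
        "\n",
        "print(sample([2.0, 1.5, 0.3, -1.0]))  # prints a sampled token index\n",
        "```\n",
        "\n",
        "For reference, the timing block works out to roughly 2 tokens per second of generation (472.36 ms per run) on this CPU runtime."
      ],
      "metadata": {}
    }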
  ]
}